{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Plumbing: A look under the hood of ``gluon``\n",
    "\n",
    "In the previous tutorials, we taught you about linear regression and softmax regression. We explained how these models work in principle, showed you how to implement them from scratch, and presented a compact implementation using ``gluon``. And since our focus was on modeling, we showed \n",
    "\n",
    "We explained *how to do things* in ``gluon`` but didn't really explain *how they work*. We relied on ``nn.Sequential``, syntactically convenient shorthand for ``nn.Block`` but didn't peek under the hood.  And while each notebook presented a working, trained model, we didn't show you how to introspect its parameters, save and load models, etc. In this chapter, we'll take a break from modeling to explore the gory details of ``mxnet.gluon``."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load up the data\n",
    "First, let's get the preliminaries out of the way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "import mxnet as mx\n",
    "import numpy as np\n",
    "from mxnet import nd, autograd, gluon\n",
    "from mxnet.gluon import nn, Block\n",
    "\n",
    "###########################\n",
    "#  Speficy the context we'll be using\n",
    "###########################\n",
    "ctx = mx.cpu()\n",
    "\n",
    "###########################\n",
    "#  Load up our dataset\n",
    "###########################\n",
    "batch_size = 64\n",
    "def transform(data, label):\n",
    "    return data.astype(np.float32)/255, label.astype(np.float32)\n",
    "train_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform),\n",
    "                                      batch_size, shuffle=True)\n",
    "test_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),\n",
    "                                     batch_size, shuffle=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Composing networks with ``gluon.Block``\n",
    "Now you might remember that up until now, we've defined neural networks (for, example a multilayer perceptron) like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "net1 = gluon.nn.Sequential()\n",
    "with net1.name_scope():\n",
    "    net1.add(gluon.nn.Dense(128, activation=\"relu\"))\n",
    "    net1.add(gluon.nn.Dense(64, activation=\"relu\"))\n",
    "    net1.add(gluon.nn.Dense(10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a convenient shorthand \n",
    "that allows us to express neural network compactly. \n",
    "When we want to build simple networks, \n",
    "this saves us a lot of time. \n",
    "But both (i) to understand how ``nn.Sequential`` works, \n",
    "and (ii) to compose more complex architectures, \n",
    "you'll want to understand ``gluon.Block``.\n",
    "\n",
    "Let's take a look at the same model would be expressed with ``gluon.Block``."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class MLP(Block):\n",
    "    def __init__(self, **kwargs):\n",
    "        super(MLP, self).__init__(**kwargs)\n",
    "        with self.name_scope():\n",
    "            self.dense0 = nn.Dense(128)\n",
    "            self.dense1 = nn.Dense(64)\n",
    "            self.dense2 = nn.Dense(10)\n",
    "        \n",
    "    def forward(self, x):\n",
    "        x = nd.relu(self.dense0(x))\n",
    "        x = nd.relu(self.dense1(x))\n",
    "        return self.dense2(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we've defined a class for MLPs, we can go ahead and instantiate one:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "net2 = MLP()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And initialize its parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "net2.initialize(ctx=ctx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point we can pass data through the network by calling it like a function, just as we have in the previous tutorials. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "[[ 0.02268829  0.00568802 -0.05402809  0.00068389 -0.02422323  0.04621337\n",
       "  -0.03124876 -0.00491598 -0.03984836  0.00056135]]\n",
       "<NDArray 1x10 @cpu(0)>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "for data, _ in train_data:\n",
    "    data = data.as_in_context(ctx)\n",
    "    break\n",
    "net2(data[0:1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calling ``Block`` as a function\n",
    "\n",
    "Notice that ``MLP`` is a class and thus its instantiation, ``net2``, is an object. If you're a casual Python user, you might be surprised to see that we can *call an object as a function*. This is a syntactic convenience owing to Python's `__call__` method. Basically, ``gluon.Block.__call__(x)`` is defined so that ``net(data)`` behaves identically to ``net.forward(data)``. Since passing data through models is so fundamental and common, you'll be glad to save these 8 characters many times per day."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## So what is a ``Block``?\n",
    "\n",
    "In ``gluon``, a ``Block`` is a generic component in a neural network. The entire network is a ``Block``, each layer is a ``Block``, and we can even have repeating sequences of layers that form an intermediate ``Block``.\n",
    "\n",
    "This might sounds confusing, so let's make it simple. Each neural network has to do the following things:\n",
    "1. Store parameters\n",
    "1. Accept inputs\n",
    "1. Produce outputs (the forward pass)\n",
    "1. Take derivatives (the backward pass)\n",
    "\n",
    "This can be said for the network as a whole, but it can also be said of each individual layer. A single fully-connected layer is parametrized by weight matrix and bias vector, produces outputs from inputs, and can given the derivative of some objective with respect to its outputs, can calculate the derivative with respect to its inputs. \n",
    "\n",
    "Fortunately, MXNet can take derivatives automatically. So we only have to define the forward pass (``forward(self, x)``). Then, using ``mxnet.autograd``, ``gluon`` can handle the backward pass. This is quite a powerful interface. For example we could define the forward pass for some component to take multiple inputs, and to combine them in arbitrary ways. We can even compose the ``forward()`` function such that it throws together a different architecture on the fly depending on some conditions that we could specify in Python. As long as the result is an NDArray, we're in the clear.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What's the deal with ``name_scope()``?\n",
    "The next thing you might have noticed is that we added all of our layers inside a ``with net1.name_scope():`` block. This coerces ``gluon`` to give each parameter an appropriate name, indicating which model it belongs to, e.g. ``sequential8_dense2_weight``. Keeping these names straight makes our lives much easier once we start writing more complex code where we might be working with multiple models and saving and loading the parameters of each. It helps us to make sure that we associate each weight with the right model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Demystifying ``nn.Sequential``\n",
    "So Sequential is basically a way of throwing together a Block on the fly. Let's revisit the ``Sequential`` version of our multilayer perceptron."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "net1 = gluon.nn.Sequential()\n",
    "with net1.name_scope():\n",
    "    net1.add(gluon.nn.Dense(128, activation=\"relu\"))\n",
    "    net1.add(gluon.nn.Dense(64, activation=\"relu\"))\n",
    "    net1.add(gluon.nn.Dense(10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In just 5 lines and 183 characters, we defined a multilayer perceptron with three fully-connected layers, each parametrized by weight matrix and bias term. We also specified the ReLU activation function for the hidden layers. \n",
    "\n",
    "Sequential itself subclasses ``Block`` and maintains a list of ``_children``. Then, every time we call ``net1.add(...)`` our net simple registers a new child. We can actually pass in arbitrary ``Block``, even layers that we write ourselves. \n",
    "\n",
    "When we call ``forward`` on a ``Sequential``, it executes the following code:\n",
    "\n",
    "```\n",
    "def forward(self, x):\n",
    "    for block in self._children:\n",
    "        x = block(x)\n",
    "    return x\n",
    "```\n",
    "Basically, it calls each child on the output of the previous one, returning the final output at the end of the chain."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Shape inference\n",
    "One of the first things you might notice is that for each layer, we only specified the number of nodes output, we never specified how many input nodes! You might wonder, how does ``gluon`` know that the first weight matrix should be $784 \\times 128$ and not $42 \\times 128$. In fact it doesn't. We can see this by accessing the network's parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sequential1_ (\n",
      "  Parameter sequential1_dense0_weight (shape=(128, 0), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential1_dense0_bias (shape=(128,), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential1_dense1_weight (shape=(64, 0), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential1_dense1_bias (shape=(64,), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential1_dense2_weight (shape=(10, 0), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential1_dense2_bias (shape=(10,), dtype=<type 'numpy.float32'>)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "print(net1.collect_params())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look at the shapes of the weight matrices: (128,0), (64, 0), (10, 0). What does it mean to have zero dimension in a matrix? This is ``gluon``'s way of marking that the shape of these matrices is not yet known. The shape will be inferred on the fly once the network is provided with some input."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So when we initialize our parameters, you might wonder, what precisely is happening?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "net1.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this situation, ``gluon`` is not actually initializing any parameters! Instead, it's making a note of which initializer to associate with each parameter, even though it's shape is not yet known. The parameters are instantiated and the initializer is called once we provide the network with some input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sequential1_ (\n",
      "  Parameter sequential1_dense0_weight (shape=(128L, 784L), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential1_dense0_bias (shape=(128L,), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential1_dense1_weight (shape=(64L, 128L), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential1_dense1_bias (shape=(64L,), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential1_dense2_weight (shape=(10L, 64L), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential1_dense2_bias (shape=(10L,), dtype=<type 'numpy.float32'>)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "net1(data)\n",
    "print(net1.collect_params())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This shape inference can be extremely useful at times. For example, when working with convnets, it can be quite a pain to calculate the shape of various hidden layers. It depends on both the kernel size, the number of filters, the stride, and the precise padding scheme used which can vary in subtle ways from library to library."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Specifying shape manually\n",
    "\n",
    "If we want to specify the shape manually, that's always an option. We accomplish this by using the ``in_units`` argument when adding each layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "net2 = gluon.nn.Sequential()\n",
    "with net2.name_scope():\n",
    "    net2.add(gluon.nn.Dense(128, in_units=784, activation=\"relu\"))\n",
    "    net2.add(gluon.nn.Dense(64, in_units=128, activation=\"relu\"))\n",
    "    net2.add(gluon.nn.Dense(10, in_units=64))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the parameters from this network can be initialized before we see any real data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sequential2_ (\n",
      "  Parameter sequential2_dense0_weight (shape=(128, 784), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential2_dense0_bias (shape=(128,), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential2_dense1_weight (shape=(64, 128), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential2_dense1_bias (shape=(64,), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential2_dense2_weight (shape=(10, 64), dtype=<type 'numpy.float32'>)\n",
      "  Parameter sequential2_dense2_bias (shape=(10,), dtype=<type 'numpy.float32'>)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "net2.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)\n",
    "print(net2.collect_params())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next\n",
    "[Writing custom layers with ``gluon.Block``](../chapter03_deep-neural-networks/custom-layer.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For whinges or inquiries, [open an issue on  GitHub.](https://github.com/zackchase/mxnet-the-straight-dope)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
